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Total variation on a tree 


Vladimir Kolmogorov* and Thomas Pock} and Michal Rolinek* 


Abstract 

We consider the problem of minimizing the continuous valued total variation subject 
to different unary terms on trees and propose fast direct algorithms based on dynamic 
programming to solve these problems. We treat both the convex and the non-convex case 
and derive worst case complexities that are equal or better than existing methods. We show 
applications to total variation based 2D image processing and computer vision problems 
based on a Lagrangian decomposition approach. The resulting algorithms are very efficient, 
offer a high degree of parallelism and come along with memory requirements which are only 
in the order of the number of image pixels. 


1 Introduction 

Consider the following problem: 

min/(a;), f(x) = ^ fi(xi) + ^ (1) 

i£V 

where (V,E) is a (directed) tree with n = \V\ nodes, unary terms fi : R —> M are continuous 
functions, and pairwise terms are given by 


fij(z) = min{u!jj • \z\,Cij} (2) 

with Wij > 0. This is known as a “truncated TV regularizer”; if Cij = +oo then it is called a 
“TV regularizer”. To simplify the presentation, we make the following assumptions: 

• Function f is bounded from below and attains a minimum at some point x € R". 

• All terms fi, are continuous piecewise-linear or piecewise-quadratic functions with a 
finite number of breakpoints. 

We will consider the following cases. 
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Non-convex case Here we will assume that unary terms fi are piecewise-linear functions with 
0(1) breakpoints (not necessarily convex). We will present a dynamic programming algorithm 
for minimizing function |l]) that works by passing messages , which are piecewise-linear functions 
M —> M. If Cij = +00 for all (i,j) £ E then we will prove that the number of breakpoints in each 
message is at most 0(n), leading to complexity 0(n 2 ). In the truncated TV case we do not have 
a polynomial bound. Our tests, however, indicate that the algorithm is efficient in practice. 

Convex case Next, we will consider the case when all unary and pairwise terms fi , f l:t are con¬ 
vex functions (which means that C\j = +00 for all (*, j) £ E). We will describe three algorithms: 
(i) 0{n) algorithm for quadratic unaries on a chain, (ii) O(nlogn) algorithm for piecewise-linear 
or piecewise-quadratic unaries on a tree, (iii) Oin log log n) algorithm for piecewise-linear unaries 
on a chain. In the last two cases we assume that the number of breakpoints in each term fi is 
0 ( 1 ). 


1.1 Related work 

Non-convex case In this case we show how to compute efficiently distance transforms (or 
min-convolutions) for continuous piecewise-linear functions. To our knowledge, the previous 
algorithmic work considered only distance transforms for discretized functions [Si- 


Convex case on general graphs Hochbaum showed [22j that problem 0 on general graphs 
can be solved in polynomial time for several choices of convex unary functions. The method 
works by reducing problem Q to a sequence of optimization problems with binary variables 
whose unary terms depend linearly on a parameter A. This reduction has also appeared later 


in [301 H21 El- 

Specializing Hochbaum’s method to trees yields the following complexities: (i) 0{n 2 ) for 
problems with quadratic unaries, assuming that the values of A are chosen as in HU; (ii) 0(n\ogn) 
for piecewise-linear unaries with 0(1) breakpoints, assuming that the values of A are computed by 
a linear-time median algorithm (as discussed in Sec. 4.3 for chains). Instead of using a linear-time 
median algorithm, it is also possible to sort all breakpoints in O(nlogn) time in a preprocessing 
step. 


Convex case on chains The convex case on a chain (or its continuous-domain version) has 
been addressed in mmmmmmmmm- In particular, it has been shown that the 
problem with quadratic unaries fi{z) = ^(z — Ci) 2 can be solved in Oin) time by the taut string 
algorithm PESOS] and by the method of Johnson [23j. Condat [TO] presented an 0{n 2 ) algorithm, 
which however empirically outperformed the method in mm according to the tests in [TO] , 
In [2], the authors proposed an elegant derivation of the method of Condat m starting from 
the tau string algorithm da, which in turn also allows to use weighted total variation. Our 
0(n) method for this case can be viewed as a generalization to weighted total variation and an 
alternative implementation of Johnson’s algorithm that requires less memory. 

For the problem with piecewise-linear unaries fi(z) = \z — c,;| the best known complexity 
was O(nlogn), which is achieved either by Hochbaum’s method (as discussed earlier), or by the 
method in [Tl] . We improve this to O(nloglogn). 

We generally follow the derivation in [231 , which is quite different from the one in mmm- 
We extend this derivation to non-smooth functions and to general trees. 
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1.2 Applications 

In Sec. [5] we show applications to continuous valued total variation based 2D image processing 
and computer vision. For this we adopt a Lagrangian approach to decompose the 2D problems 
into a set of ID problems. In the convex case, we solve the resulting saddle point problems 
using accelerated primal-dual algorithms, outperforming the state-of-the-art by about one order 
of magnitude. In the non-convex case we solve a non-convex saddle-point problem by again 
applying a primal-dual algorithm, which however has no theoretical guarantee to converge. The 
resulting algorithms are efficient, easy to parallelize and require memory only in the order of the 
image pixels. 


2 Preliminaries 

We assume that all edges in the tree ( V ., E) are oriented toward the root r £ V. Thus, every node 
i £ V — {r} has exactly one parent edge (i,j) £ E. When specializing to a chain, we assume 
that V = [n] n} and E = {(*,* + 1) | i £ [n — 1]}, with n being the root. 


Min-convolution For functions h, g : ffi. —> R we define their min-convolution h(& g via 

{h®g)(y)=mm[h(x)+g(y-x)] My (3) 

X 

With such operation we can associate a mapping ir that returns 7 r(y) £ arg min x \h(x) +g(y — a;)] 
for any given y\ we say that such mapping 7r corresponds to the min-convolution operation aboveFl 
Operation ([ 3 ]) is known under several other names, e.g. the maximum transform [5], the slope 
transform [25], and the distance transform |16j . Note that if g = f tJ and function h. is defined on 
a grid then there exist efficient algorithms for computing h ® g [16] . For the non-convex case we 
will need to extend these algorithms to piecewise-linear functions h : R. —► R. 


Dynamic Programming It is well-known that function on a tree can be minimized using 
a dynamic programming (DP) procedure (see e.g. [17] 1. If the tree is a chain, then it is equivalent 
to the Viterbi algorithm. Let us review this procedure. 

DP works with messages Mij : R —> R. for (*,_))£ E and M,; : R — > R. for * £ V. These 
messages are computed in the forward pass by going through edges (i. j) £ E in the order 
starting from leaves toward the root and setting 



= fi(Xi) + ^2 

(4a) 


( k,i)eE 


Mij(xj) 

= min Mi(xi) + fij(xj - xf) 

(4b) 


for all Xi and Xj. (Due to the chosen order of updates, the right-hand side is always defined). 
Note that (4b I is a min-convolution operation: M,j = M* <g> f t j. While computing it, we also 
need to determine a corresponding mapping 7T. t j (it will be used in the backward pass). 

After computing all messages we first find x r that minimizes M r (x r ). and then go through 
edges (i, j) £ E in the backward order and set Xi = nij(xj). For completeness, let us show the 
correctness of this procedure. 


this paper we will apply operation § only in cases in which it is defined, i.e. the minimum exists and is 
attained at some point x E R (for each y G I). In particular, both functions h and g will be piecewise-linear or 
piecewise-quadratic. 
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Proposition 1. The procedure above returns a minimizer of /(•)• 

Proof. For a node i £ V and an edge (i,j) £ E define “partial costs” /«(•) and f*ij(-) via 


f*i(x) = ^2fp(x p )+ ^2 fpq(x q -Xp) \/x £ R" (5a) 

pev i {p,q)&Ei 

f*ij(x ) = f*i(x) + fij{xj - Xi) \/x £ R" (5b) 

where (V), Ej ) is the subtree of ( V,E) rooted at i. In particular, for the root i = r we have 
(' V r ,E r ) = (V,E) and /* r ( x) = f(x). Now define functions 

Mi(z) = min f*i{x) Vz € M (6a) 

X\Xi—Z 

Mij(z) = min f*ij{x) V^eR (6b) 

x:x-j=z 


f*i{x) = fi(x) + i)eE f*ki ( x )i while (4b) follows from (5b). Therefore, update equations Q 


compute quantities defined in ([6]). This also means that values Mjfz) and Mj : j{z) are finite for 
all ^ £ K. (due to the assumption made in the beginning of Sec. [I]). 

Now let us consider the backward pass. Rename nodes in V as {1,..., n} (with n being the 
root) so that the procedure first assigns x n , then x„_i, cc n _ 2 , and so on until X\. Note that 
Vi C for any i £ V. Define set 


Xi = {y £ M" | y 2 = xj Vj S {*,..., n}} 

Next, we will prove that for any i £ V we have min^g^. f(y) < min y6 R^ f(y) (for i = 1 this 
will mean that /( x) < min^ggn f(y)). We use induction on n in the decreasing order. We have 
x n £ argmin. M n (z) with M n {z) = min a;:Xi= 2 f(x)\ this gives the base case i = n. Now suppose 
the claim holds for i + 1 < n\ let us show it for i. Let (i,j) £ E be the parent edge for i. with 
j £ {i + 1, • • ■, n}. We have Xi = irij(xj), or equivalently 


i £ argmin Mi{xi) + fij(xj - xf) = argmin 


min f*i(w) 

w:Wi=Xi 


T fij {'Xj Xi) 


(7) 


Pick x* £ argmin x , eM „. x » =iJ , i f*i(x*) and y £ argmin ygA - !+i f(y). Since function f*i(x*) depends 
only on variables x* for j £ Vi, other variables of x* can be chosen arbitrarily. We can thus 
assume w.l.o.g. that a;* = yj for all j £ V — V). This implies that 

/(**) - f(y) = + hj(xj - Xi)\ - [f*i{y) + fij(xj - yt)] 

(all other terms of / cancel each other, and we have x* = yj = Xj and x* = Xi). We also have 


+ fij(xj - Xi) < l min f ti (w) + fij(xj - yf) < f ti (y) + fij(xj - yf) 

\w:wi=yi J 

where in the first inequality we used 0- We obtain that f(x*) < f(y) = mm yeXi+1 f(y) < 
min y gRn f(y), where the last inequality is by the induction hypothesis. It can be checked that 
x* £ Xi, which gives the claim for i. □ 


In the non-convex case we will use the DP algorithm directly. In the convex case all messages 
Mi,Mij will be convex functions, and it will be more convenient to work with their derivatives 
fhi,niij, or more generally their subgradients if the messages are not differentiable. (This will 
simplify considerably descriptions of algorithms). 

The main computational question is how to manipulate with messages (or their subgradients). 
Storing their values explicitly for all possible arguments is infeasible, so we need to use some 
implicit representation. Specific cases are discussed below. 
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3 Non-convex case 


In this section we assume that unary terms /* are continuous piecewise-linear functions with 
0(1) breakpoints, and terms f l3 are given by ([2]). As we will see, in this case all messages Mi 
and Mij will also be continuous piecewise-linear functions. We will store such function as a 
sequence (so, Ai, si,..., s t _i, X t , s t ) where t is the number of breakpoints, X p is the X-coordinate 
of the p- th breakpoint with Ai < ... < At, and s p is the slope of the p- th segment. Note that 
this sequence allows to reconstruct the message only up to an additive constant, but this will be 
sufficient for our purposes. 

The two lemmas below address updates (4a) and (4b), respectively; their proofs are given in 
Sec. [3JJand[T2i 


Lemma 1. If messages M^i for ( k , i ) G E are piecewise-linear functions with tk breakpoints then 
Mi = fi + Xqfc j) £ E Mj~i is also a piecewise-linear function with at most t = 0(1) + J2(k i)eE ^ 
breakpoints. It can be computed in Oft\og(di + 1)) time where di = |{/c | fk,i) G E}\ is the 
in-degree of node i. 

Lemma 2. If message M 3 is a piecewise-linear function with t breakpoints then = Mi £g> 
is also a piecewise-linear function with at most 2t + 1 breakpoints. This min-convolution and the 
corresponding mapping tt tj can be computed in Oft + 1) time; the latter is represented by a data 
structure of size Oft + 1) that can be queried in Oft + 1) time. 

Furthermore, if Cij = +oo then M\j has at most t breakpoints. 


Corollary 1. If Cij = +oo for all ( i,j ) G E then function (JTJ) can be minimized using 0(n 2 ) 
time and space. 


Proof. The lemmas above imply that messages Mi and have at most 0(|li|) breakpoints, 
where V 3 C V is the set of nodes in the subtree of (V,E) rooted at i. Note that |1A;| < n. The 
overall time taken by updates (4b) is ^ eE Ofn) = Ofn 2 ). The same is true for updates (4a), 
since 

nlog(di + 1) < const ■ n di = const ■ nfn — 1) 


i£V 


iev 


Finally, we need Xq; j)^E 0 ( n ) = Ofn 2 ) space to store mappings 7r^, and 0(n 2 ) time to query 
them in the backward pass. □ 


If values are finite then the lemmas give only an exponential bound 2°^ Vi ^ on the number 
of breakpoints in messages Mi and Mjj. Our experiments, however, indicate that in practice the 
number of breakpoints stays manageable (see Sec. [5| . 


3.1 Proof of Lemma [Q 

We only discuss the complexity of computing the sequence representing Mp, the rest of the 
statement is straightforward. We need to compute a sum of d\ + 1 piecewise-linear functions 
where the breakpoints of each function are given in the non-decreasing order. This essentially 
amounts to sorting all input breakpoints; clearly, during sorting we can also compute the slopes 
between adjacent breakpoints. Sorting di + 1 sorted lists with a total of t points can be done in 
0(t\og(di + 1)) time [20 ‘. 
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(a) 



Figure 1: (a): Input function h(z). (b,c,d): Min-convolution (h®g)(z) for different functions 

9 - 


3.2 Proof of Lemma [2] 

We will use the following fact, whose proof is given in Appendix |A| 

Proposition 2. Suppose that g,g l ,... ,g m are functions with g x ( 0) = ... = g m ( 0) = 0 satisfying 

g(z)=mm{g 1 (z),...,g m (z)} Vzel (8) 

and also suppose that g satisfies a triangle inequality: 

g(zi + z 2 ) < g{zi) + g{z 2 ) Vz\,z 2 ( 9 ) 

For a given function h : R —> M define functions hP, ..., h m via h° = h and h k = h k ~ 1 ® g k . 
Then h m = h® g. Furthermore, if mappings n k correspond to min-convolutions h k ~ l g k then 
mapping n = n 1 o ... o 7 r m corresponds to min-convolution h® g. 

Consider function fij(z) = min{u>|z|, C} with w > 0. We can assume w.l.o.g. that C > 0 
(otherwise computing h ® fij = h + C is trivial). It can be checked that f tl satisfies the triangle 
inequality. We will represent it as the minimum of the following three functions: 


+00 if z < 0 , , ( w\z\ if 2 

., . n fiA z ) = \ , ^ 

wz it z > 0 I +00 if 2 > 


mz) 


C ifz^O 
0 if 2 = 0 


If C = +00 then we can take just the first two functions. By Proposition [2j it suffices to show 
how to compute h® g and the corresponding mapping 7r for a given piecewise-lincar function h 
and function g £ {/./ 7 . ff ? , ff-j}, assuming the result in each case is also piecewise-linear. These 
transformations can then be applied consecutively to give 

Mij = Mi® fij = (( Mi ® fP) ® ffj) ® ffj 

We assume below that h is represented by the sequence (so, Ai, Si,..., St- 1 , At, St.) with t > 0. 
We also assume that h bounded from below. (This must be true for all messages, otherwise f{x) 
would be unbounded from below). 
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Computing h g /A This operation is illustrated in Fig. |TJb) , and a formal procedure for 
computing it is given in Algorithm |TJ Upon termination cr is the sequence representing h g f} 3 
and II is a set of intervals defining mapping 7r as follows: 


*(y) 


A if exists [A , A + ] G II s.t. y G [A , A + ] 
y otherwise 


Algorithm 1 Computing min-convolution h g /A 

1: set p = 1, a = (so), II = 0 

2: while p < t do 

3: append (A p , min{s p , tc}) to cr 

4: if s p > w then 

5: define linear function h(z) = w(z — A p ) + h{ A p ) passing through (A p , h( X p )) 

6: find smallest q£ [p + 2, t + 1] s.t. h(X q ) < h(X q ), assuming that At+i is sufficiently large; 

if there is no such q then add [A p , +oo] to II and terminate 
7: compute A G [A q _i,A g ] with h( A) = h( A) 

8: append (A, s g _i) to cr, add [A p , A] to II, set p := q 

9: else 

10: set p := p + 1 

11: end if 

12: end while 


To verify correctness of Algorithm |TJ first note that in line 6 we have h( A p +i) > h( A p +i) since 
s p > w. Therefore, in line 7 we are guaranteed to have h( A 9 _i) > h( A g _i) and h{X q ) < h(X q ), 
and so value A in line 7 indeed exists. It can also be checked that in line 3 we always have 
s q - 1 < w. for q = 1 this holds since so < 0 due to the boundedness of h. [^] This implies 
correctness of the procedure. 

It can be seen that the number of breakpoints cannot increase: if a new breakpoint A is 
introduced in line 8, then at least one old breakpoint is removed, namely A g _i. 

Computing h <8> f? :) This case can be reduced to the previous one as follows: h ® /A = 
(/i rev (g) /A) rev where <p rev for a function ip is defined via ip Tev (z) = (To transform tp to 

(p rev , we need to reverse the sequence for tp and multiply all components by —1). 

To reduce the number of passes through the sequence, one can modify Algorithm |T| so that it 
immediately produces the sequence for h ® rev /A = (h <g> and then apply it twice noting 

that (h <g> /A) ® ffj = (h g) rev /A) (g) rev /A. 


Computing h g) /A Assume that 0 < C < +oo. Function h is defined only up to an additive 
constant, so we can set e.g. h{ Ai) = 0. First, we compute p G argmin p6 m h(X p ) and set C' = 
h(X p ) + C. We now have (h g ffj){z) = min {h{z),C'} (Fig. Jl|c)). Let II be the set of intervals 
whose union equals {z \ h(z) > C'}, then the mapping n is given by 


*(y) 


X q if y G [A , A + ] for some [A , A + ] G II 
y otherwise 


2 If we didn’t have the assumption that fix) is bounded, then we 
sq > w then return a = (w) and II = [—cxd, +oo]. 


could modify Algorithm [T] as follows: 


if 
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The number of breakpoints in h ® ff ;j is at most 2t + 1 since for each of the t + 1 linear segments 
of h at most one new breakpoint is introduced. 

Clearly, the sequence for h <g> ffj and the set II can be computed in Oft + 1) time; we omit 
details. 

3.3 Extensions 

To conclude the discussion of the non-convex case, we mention two possible extensions: 

(i) Allow pairwise terms fij to be piecewise-linear functions that are non-increasing concave on 
(—oo,0] and non-decreasing concave on [0, +oo). (A truncated TV regularizer is a special 
case of that.) 

(ii) Allow unary terms /,; to be piecewise-quadratic. 

We claim that in both cases messages can be computed exactly (either as piecewise-linear or 
piecewise-quadratic), although the number of breakpoints could grow exponentially. Below we 
give a proof sketch only for the first extension, in which the messages stay piecewise-linear. 

Adding a constant to /,y does not change the problem, so we can assume w.l.o.g. that fij(O) = 
0. If fij has m — 1 breakpoints then we can represent it as a minimum of functions /A,..., /™ 
where /*) satisfies /*)(0) = 0 and is either (a) linear on (—oo,0) and +oo on (0,+oo), or (b) vice 
versa: +00 on (— 00 ,0) and linear on (0,+oo). It can be checked that fij satisfies the triangle 
inequality, so we can apply Proposition [2j It is thus sufficient to describe how to compute 
min-convolution h (g) g for each g £ {/f,..., /”)}. 

Assume that g is +00 on (— 00 ,0) and linear on (0,+ 00 ) (the other case is symmetric). We 
have g(z) = az + C for 2 > 0, where a, C > 0 are some constants. For a function p define 
function tp a via p a (z) = ip(z) — az. (Note that adding a linear term to a ip can be done by 
traversing the sequence representing p and increasing all slopes by a constant.) It can be checked 
that h (g> g = ( h a ® g a )~ a , so it suffices to consider the min-convolution h a ® g a . Such min- 
convolution is illustrated in Fig. [T](d) . It is not difficult to see that it adds at most t breakpoints 
and can be implemented in Oft + 1) time, where t is the number of breakpoints of h. We leave 
details to the reader. 


4 Convex case 


We now assume that all functions ft and fij are convex; as we will see, in this case function 
f(x) can be minimized much more efficiently. For convenience of notation we will assume that 
functions fij are given by 


fij ( 2 ) 


vj-- ■ z if 2 < 0 
wf] ■ z if 2 > 0 


with Wij <w±. ^ _ 

It can be checked that if function Mi is convex then so is My = Mj (§) f,j (see Fig. [ 2 ]), and 
therefore by induction all messages Mj and Mij will be convex. It will be more convenient to 
work with their derivatives fhi(z) = M[(z) and rriij(z) = M[^{z). If Mi is not differentiable at z 
then we let fhi(z) to be an arbitrary subgradient of Mi at z (and similarly for M.^). Note that 
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Figure 2: Min-convolution M y = Mi g) /y. If Mi is convex then so is Mij. Function My 
coincides with Mi on [Ay, Ay] and is linear on (— oo, A“] and [Ay,+oo) with the slopes w~j and 
wfj respectively. 


functions my rriij are non-decreasing and satisfy 

Mi(z) = const + f rhi{\)d\ Mz £ R. 

Jo 

Mij(z) = const + / my(A)dA Vz £ R 

Jo 


Let gi{z) be a subgradient of fi at z; function gi is then non-decreasing (and not necessarily 
continuous). An algorithm that works with subgradients is given below (Algorithm [2]) . In this 
algorithm we denote clipj a6 ](z) = min{max{z, a}, 6}} (i.e. the projection of z to [a, b\). The 


updates in lines 2 and 3 correspond to eq. (4a I and (4b) respectively, and the values Ay, A A in 


line 4 describe the mapping 7ry corresponding to the min-convolution My = Mj g /y. 


Algorithm 2 DP algorithm for the convex case. 

0: add new node f and edge (r, r) to (V, E ), make f the new root; set w~~ = w+- = 0 
// forward pass; steps 2 and 3 should be done for all z £ ffi. 

1: for each edge (i, j) £ E do in the order from the leaves toward the root 
2 : set fhi(z) = gi(z) + m ki (z) 

(. k,i)eE 

3: set rriij(z) = clip^-^^( to^z)) 

4: find interval [Ay, Ay] such that m*(Ay) = w~j and rn,(Ay) = Wy 

5: end for 

// backward pass 
6: set x r £ [AT, A A] 

7: for each edge (i, j) £ E — {(r,r)} do in the order from the root toward the leaves 
8: set Xi = c 1 ipr. — \+l(^i) 

I -ij 

9: end for 


If in line 4 there is no Ay with mj( Ay) = w- then we use a natural rule (considering that 
function fhi is non-decreasing), namely set 

Ay £ [ sup{z | fhi{z) < w"-}, inf{z | fhi(z) > «)“•} ] (10) 

Note that the bounds in (|To| ) (and thus A“) can be infinite. A similar rule is used for Ay. 

Note that Algorithm [2] is equivalent to the one given by Johnson [23], except that the latter 
has been formulated for smooth functions fi and a chain graph (V,E). 
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Figure 3: TV on a chain with quadratic unaries; ( k,i,j ) = Message rriij is obtained 

from niki via two transformations: fhi(z) = rriki(z) + (qA — 6* and rriij(z) = clip, - +,(m,;(z)). 

In transformation (b) two new breakpoints are added (shown as thick black circles), and some 
breakpoints may be removed (shown as empty circles). 

4.1 Quadratic unaries on a chain: 0(n ) algorithm 

Let us assume that graph (V,E) is a chain with nodes V = [n], where n is the root. We also 
assume that the unary terms are strictly convex quadratic functions: fi(xi) = \a,ix\ — biXi with 
dj > 0. We thus have gi(xi) = aiX — hi. As shown in [23] - Algorithm [2] can be implemented in 
0(n) time. In this section we describe a more memory-efficient version: we use 2 floating points 
per breakpoint, while |23j used 3 floating points plus a Boolean flag. 

It can be checked by induction that all messages rriij and m. t are piecewise-linear non¬ 
decreasing functions, and the latter are strictly increasing (see Fig. [3|. We will maintain the 
current message (which can be either fhi or rriij) with t breakpoints and t + 1 segments as a 
sequence (so, Ai, si,..., st-i, At, s*). Here A p is the X-coordinate of the p-th breakpoint with 
Ai < ... < A t , and s p represents in a certain way the slope of the p-th segment. 

If s p were the true slope of the corresponding segment then in transformation (a) in Fig. [3] 
we would need to go through the entire sequence and increase the slopes by eq. To avoid this 
expensive operation, we use an implicit representation: the true slope of the p-th segment in 
messages fhi and rriij is given by s p + di, where di = a k- Thus, the transformation (a) in 

Fig. [3] is performed automatically; we just need to compute a* = a*_i + a^. 

Sequence (so, Ai, si,..., St- 1 , At, St ) will be stored contiguously in an array of size 4n + 0(1) 
together with indexes pointing to the first and the last elements. In the beginning the sequence 
is placed in the middle of this array, so that there is a sufficient space for growing both to the 
left and to the right. 

Forward pass We have described the data structure used for storing the messages; let us 
now discuss how these data structures are updated during transformation (b) in Fig. [3] for edge 
( i,j ) £ E. We assume that (so, Ai, s i,..., St- 1 , At, St), t > 2 is the current sequence for message 
fhi. 

During the update some breakpoints at the two ends of the sequence may be removed (they 
are shown as empty circles in Fig. 3b, and two new breakpoints A,” and Aj|j are appended at the 
ends. Let di > 0 be the number oi removed breakpoints. We show below that the update can 
be performed in 0{di + 1) time. This will imply that the forward pass takes 0{n) time. Indeed, 
we can use an amortized analysis; when adding a new breakpoint, we give it one unit of credit, 
and use this credit when the breakpoint is removed. The total number of added breakpoints is 
2 n, which gives the claim. 

In the remainder of this section we describe details of the update for edge (i, j). To determine 
where A,“ is to be inserted, we first need to find the largest index l with fhi(Xi) < w~y, if there is 
no such index then let £ = 0. This can be done by traversing the breakpoints from left to right, 
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stopping when index £ is found. Note that values m,;(Afc) are not stored explicitly, but can be 
recursively computed as follows (here k = i — 1): 

fhi( Ai) = + a, ; Ai - bi 

m.j(Ap-i-i) = 77ij(Ap) + (sp 4- di) • (A p+ i A p ) 

Once £ is computed, we can find the value A“- for which Wj(A“) = in 0(1) time. We then 
change the sequence to (—5$, A“-, s^+i,..., s t _i, A t , s t ). All these operations take 0(£ + 1) time. 

Inserting breakpoint A A is performed in a similar way. We traverse the sequence from right 
to left and compute the smallest index r > £ + 1 with rhi( A r ) > w t; if there is no such r then 
let r = t + 1. We use recursions 


fhi{a t ) = w^ + aiXt-bi 

?Tij(Ap) = 77T.i(Ap_)_i) (Sp -(- cq) • (Ap+i Ap) 

We then set At. so that uij(At) = wt and change the sequence to (— cq, A“ , s^+i,..., s r _i, At, —a,). 

Backward pass In the backward pass we need to update Xi = clip,, - A +,(a:j) for all edges 

(i, j) £ E in the backward order. These updates can be performed in 0(n) time if we record 
values A~- and Aj during the forward pass. 

The overall memory requirements of the algorithm (excluding input parameters) is thus 6n + 
0(1) floating point numbers: 4 n + 0(1) for storing the sequence (so, Ai, Si,..., St- 1 , At, St) and 
2n+ 0(1) for storing breakpoints A“-, AA for all edges (i,j). Note, breakpoints A,“ can be stored 
in the same array used for returning the solution x. 


4.2 Piecewise-quadratic unaries: 0{n log n) algorithm 

To simplify the presentation, we will first consider the case of piecewise-linear unaries on a chain, 
and then discuss extensions to trees and to piecewise-quadratic unaries. 


Piecewise-linear unaries on a chain In this case terms g L = f[ and the messages rn^ and 
fhi will be piecewise-constant non-decreasing functions; Fig. [4] illustrates how they are updated. 
The current message h : R —> K. (which is either nriij or fhi ) will be represented by the following 
data: (i) values h~ = h{— oo) = min- h{z) and h + = h(+ oo) = max z h(z); (ii) a multiset S of 
breakpoints of the form a = (A CT , S a ) where A^ is its A'-coordinate and 5 a is the increment in the 
value of h at this breakpoint. We thus have 

h(z) = h~ + ^2 S<j = h + - ^2 3 <j 

aGS'.XtrKz aGS:X CT >z 


assuming that g is left-continuous. Points a £ S will be stored in a double ended priority queue 
which allows the following operations: Insert (which inserts a single a point into S), FindMin 
(which finds a point a £ S with the minimum value of Ao-), FindMax, RemoveMin, and RemoveMax. 
In our implementation we use a Min-Max Heap |1] which takes 0(1) for FindMin/FindMax and 
0(log n) for Insert/RemoveMin/RemoveMax (assuming that the total number of points is bounded 
by 0(nj). 

Let us discuss how to update this data during message passing. First, consider the update 


fhij = mki + gi- If gi has one breakpoint, i.e. gi(z) 


o, if z <bi 

, then we insert a = 
at if 0 > W 
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Figure 4: TV on a chain with piecewise-linear unaries. 


{b,. af — a^ ) into S and update h := h +a , h + := h + +a + . If g, has more than one breakpoint 
then the procedure is similar. Now consider the update rriij(z) = clip, - w +i(rhi(z)). To clip 

from below, we first need to remove points a £ S with m* (A CT + 0) < w F.. For that we repeatedly 
call FindMin/RemoveMin (updating h~ accordingly) until we get h~ > w~£. As the last step, we 
may need to update the value 8 a for a = FindMin. (The case when S becomes empty should 
be handled separately.) Clipping from above is done in a similar way. During this procedure we 
can also compute values A”- and A+-. 

It remains to discuss the complexity. Insert, RemoveMin and RemoveMax operations are 
called at most 0(n) times (the number of points is O(n)), so they take 0{n log n) time. When 
computing rriij, the number of calls to FindMin exceeds that to RemoveMin by at most 1 (and 
similarly for “Max”), so FindMin/FindMax are called 0(n) times and take 0(n) time. 

Extension to trees If graph (' V ,, E) is a tree, we will use the same data structure for each 
branch as we go from the leaves toward the root. The difference from the previous case is that 
now during the update fhi = gi + Y^(k i)eE Wfe * we nee( l to compute the union of multisets 
corresponding to messages nriki- Thus, we need a version of double ended priority queue that 
allows merging two queues. One possibility is to use two Fibonacci Heaps [H] (one for the 
min and one for the max operations) which allow merging in 0(1). The total number of merge 
operations is O(n), so the overall complexity is still O(nlogrc). 

Extension to piecewise-quadratic unaries In this case terms (ji (and thus messages my, 
fhi) will be non-decreasing piecewise-linear functions (possibly, discontinuous at breakpoints). 
We will use the same approach as before, only now each segment will be specified via two 
numbers (a, b) (that define function az + 6), not one. Thus, h~ is now a vector h~ = (a~,b~) 
with h(z ) = a~z + b~ for z —¥ — oo, and similarly for h + . A breakpoint a is now given by a triple 
a = (A ct , 5a ai 8b a ) where the last two values describe the change in the parameters of the linear 
function at this breakpoint. One difference in the algorithm is that now new breakpoints may 
appear during the update rriij(z) = clip, - w + ^(fhi(z)), similar to the case in Sec.|4.1| However, 
the number of such breakpoints is at most 2, and therefore the complexity is still 0{n logn). 

4.3 Piecewise-linear unaries on a chain: 0[n log logn) algorithm 

Here we assume again that (V, E) is a chain with V = [n] and functions /* are piecewise-linear 
with 0(1) breakpoints. Thus, functions g t are non-decreasing piecewise-constant; we will assume 
that they are left-continuous. It is well-known j26] that any submodular function h{z) has a 
unique lowest minimizer; we will denote it as argminj h{z) £ argmin. h(z). We will show how 
to compute x = argmin” f{x). To simplify the presentation, we will assume that it is bounded, 
i.e. x £ R n . 
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Let A be the multiset of breakpoint values A present in unary terms, so that |A| = 0(n). It 
can be checked that there exists an optimal solution x £ A n (e.g. by observing that the algorithm 
given in the previous section never introduces new breakpoint values). 

Note that in all previous algorithms we explicitly computed values A“ and Ad.. The proposi¬ 
tion below shows that we cannot afford to do this anymore if we want to improve on 0(n log n). 
(Its proof is given in Appendix [B| and is based on a reduction to the sorting problem.) 

Proposition 3. Any comparison-based algorithm that computes values A- for all ( i,j ) £ E 
requires fl(n\ogn) comparisons in the worst case. 

To motivate our approach, we will first describe an alternative 0(n log n) algorithm that 
avoids computing values A“- and Ah. This algorithm can be viewed as a specialization of 
Hochbaum’s algorithm [22] to chain graphs. We will then show how to modify it to get 0[n log log n) 
complexity. 

The idea is to reduce minimization problem (|T]) to a sequence of problems of the following 
form (for some fixed values of parameter A £ R): 

min g x (y), g x (y) = V gi{X)y l + V fij(yj-yi) 

Such reduction to a parametric maxflow problem , due to Hochbaum [22], is well-known for the 
TV problem on general graphs; it also appeared later in EHUniE]- It is based on the following 
result. 

Theorem 1 ([22]). For a fixed A £ K, lety = argmin“ g ^ 0 g\{y). Denote Vq = {i £ V\yi = 0} 
and V\ = {i £ V \ yi = 1}. Let x = argmin“ gR „ f[x); for brevity write x = (x 0 ,® 1 ) where x k is 
the subvector of x corresponding to subset 14. Then x° < A and x 1 > A component-wise, where 
A denotes vector (A,..., A) of the appropriate dimension. 

The theorem suggests a divide-and-conquer algorithm for computing x = argmuT}., /(x): 

• Pick some “pivot” value A £ [a, b] and compute y = argmin“ g ^- 0 ±y„g X (y)', this partitions 
the nodes into two subsets Vo and V\ of sizes no and n\ respectively. 

• Compute recursively x° = arg min“ 0g r A , no /(x°, A) and x 1 = argmin“ lg r A>b , ni /(AjX 1 ) (or 
solve these problems explicitly, if e.g. their size is small enough)^] These two subproblems 
are defined on induced subgraphs (Vo,E\Vo\) and (Vi,.E[Vi]), respectively. Each of them 
is a union of chains; each chain is solved independently via a recursive call. 

Let us apply this strategy to our problem. First, observe that for a fixed A function g x (y) can 
be minimized in 0{n) time. Indeed, we can use a dynamic programming approach described in 
Sec. [2j except that instead of continuous-valued variables we now have {0, l}-valued variables. 
Let fhj(yj; A) and mij(yj]X) be the corresponding messages where j £ V, ( i,j ) £ E and yj £ 
{0,1}. To extract an optimal solution, it suffices to know the differences mj(l;A) — 77^(0; A) 
and m,j (1; A) — m.y(0;A). Denote these differences as rhi{ A) and m i: j (A) respectively. It can be 
checked that the update equations for these values are given by lines 2 and 3 of Algorithm [2] for 
z = X, and each of these updates takes 0(1) time. 

Second, we note that in the subproblem min^ogja ^n,, f(x°, A) we can modify the unary terms 
for nodes i £ V o by removing all breakpoints that are greater than or equal to A; this will not 

“Note that for any fixed x 1 > A we have /(a: 0 ,! 1 ) = A) + const for x° < A. This justifies replacing 

the objective function f(x°,x 1 ) with f(x°, A) in the first subproblem. A similar argument holds for the second 
subproblem. 
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change the problem. Similarly, for the other subproblem we can remove all breakpoints that are 
smaller than or equal to A. 

Finally, we need to discuss how to select value A. It is natural to take A as the median of 
values in A, which can be computed in 0(|A|) C 0(n) time [TT] , (If A is empty, then we can 
solve the problem explicitly by taking x = a.) 

Let Ao and Ai be the multisets of breakpoints present in the first and the second subproblems 
respectively. We have |A 0 | < ^|A| and |Ai| < ||A|, which leads to the following complexity (see 
Appendix [C| . 

Proposition 4. The algorithm above has complexity O(nlogn). 

We now discuss how to modify this approach to get 0(n log log n) complexity. Choose an 
integer to £ 0(logn), and define set U = {km + 1 £ [n] | k £ Z} U {n} of size N = \U\ £ 
0(n/logrc). The nodes in U will be called subsampled nodes. We assume that U = {«i,..., i^} 
with 1 = ii < ... < in = n, and let £ = {(ik,ik+ 1 ) | k £ [N — 1]}. 

The algorithm will have two stages. First, we use a divide-and-conquer strategy above to 
compute an optimal solution Xi for nodes i £U. Once this is done, a full optimal solution can 
be recovered by solving |£| independent subproblems: for each ( i,j ) £ £ we need to minimize 
function f(x) over (a; i+1 ,... ,Xj_ i) with fixed values of aq and Xj. The latter can be done in 
0(m log to) time (since j — i < to), so the complexity of the second stage is 0(Nm log to) = 
0(?rlog logn). 

We thus concentrate on the first stage. Its main computational subroutine is to compute 
an optimal solution y = argmin“ g r 0 g\(y) for a given A at nodes i £ U. As before, we will 
use dynamic programming. Passing messages in a naive way would take O(Nm) time, which is 
too slow for our purposes. To speed it up, we will “contract” each edge (*, j) £ £ into a data 
structure that will allow passing a message from i to j in O(logTO) time instead of O(to), so that 
the subroutine will take 0{N login) time. The contraction operation is described below; we then 
give a formal description of the first stage. 

Contraction Consider indices i,j £ V with i < j. Our goal to solve efficiently the following 
problem for a given A £ K: given the value to.;(A), compute message fhj{ A). Let us denote the 
corresponding transformation by TA : R —► R, so that fhj{ A) = A)). We will show that 

mapping TA can be described compactly by 3 numbers. 

Proposition 5. For a triplet r = (<5, a, b) £ R 3 with a < b define function (r) : R —► R via 

( T )(v) =tf + clip [0i6] (u) Vu £ R 

(a) If (i,j) £ E then PA = . (b) There holds 

( (S' + a' — b, b , b) if b < min I 

(5 1 , a', b') o (5, a, b) = < (S' + 6, clip 7 (a), clip J (6)) if [a, b] D I 0 

I (S' + b' — a, a, a) if a > max I 

where I = [a' — 6, b' — 5]. 2 

4 Note that in the first and third cases the composition is a constant mapping R —► M, and can be described 
by many possible triplets. We chose parameters that will ensure the correctness of the backward pass (namely, of 
eq. given later). 
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A proof of these facts is mechanical, and is omitted. Using induction on j — i , we conclude 
that TX = (5, a , b) for some constants S , a, b (that may depend on A, i and j). 

We showed that for a fixed A transformation TX can be stored using 0(1) space and queried 
in 0(1) time. Let us now discuss how to store these transformations for all A € R; we denote 
the corresponding mapping K x K —>• R. by T t j. Let Ai ,..., A* be the breakpoint values in 
the non-decreasing order present in the unary terms gk for k £ [i + 1 ,j], with t = 0(j — i). 
It follows from the previous discussion that mapping Tij can be represented by a sequence 
(t 0 , Ai, ri,..., r t _i, A t , T t ) where r p = (S p , a p , b p ). If A p < A < A p+ i then T x [v ) = (t p )(v), where 
we assume that Ao = — oo and At+i = +oo. 

Given sequences for T\j and Tjk with t and t! breakpoints respectively, we can compute the 
sequence for TA with t+t 1 breakpoints in 0(t+t' + 1) time by traversing the input sequences as in 
the “merge” operation of the MergeSort algorithm nn, and using Proposition [5jb) . Therefore, 
the sequence for Tjj with (i, j) £ £ can be computed in O(mlogm) time; the complexity analysis 
is the same as in the MergeSort algorithm. The overall time for computing the sequences for all 
(i,j) £ £ is O(Nmlogm) = 0(nloglog?r). 

Given such sequence, passing a message from i to j (i.e. computing fhj(X) = TX(fhi(X))) can 
be done in O(logm) time: first, we use a binary search to locate index p with X p < X < A p +i, 
and then return (r p )(mi(A)). We also need to discuss how to perform a backward pass, i.e. how 
to compute the optimal lowest label Xi if we know message rhi{ A) and the optimal lowest label 
Xj £ {0,1}. Denoting r p = (6, a, b), we can set 

{ 1 if fhi(X) < a 

Xj if fhi(X) £ [a, 6) (11) 

0 if fhi(X) > b 

The correctness of this rule can be verified by induction (assuming that the parameters are 
computed as in Proposition [5]); we leave it to the reader. 

Divide-and-conquer algorithm We are now ready to give a formal description of the first 
stage. It will be convenient to append two extra nodes 0 and n + 1 at the ends of the chain 
with zero unary terms. We also add edges (0,1) and (n, n + 1) with zero weight (and compute 
the sequences for mappings Toi and T n . n +i)- Clearly, this transformation does not change the 
problem. For a subsampled node * let i~ = i — m be its left subsampled neighbor, or = 0 if * 
is the first subsampled node (i.e. if i = 1). Similarly, let i + be the right subsampled neighbor of 
i (with i + = n + 1 for i = n). 

We will define a recursive procedure Solve(W, £ , a, b, £ + ). Here U is a non-empty set of 
consecutive subsampled nodes and £ is the set of edges connecting adjacent nodes in U, containing 
additionally edges (*“,*) and ( j,j + ) where (i,j) = (minW, max£/). Note that |£| = \U\ + 1. Each 
edge (i. j) £ £ has a pointer to the sequence representing mapping T^. The following invariants 
will hold: 

(a) Minimizer x = argmin” f{x) satisfies 

(i) < a (if t- = 0) or aq- > b (if =1), where i = min U; 

(ii) Xk £ [a, b) for all k £U\ 

(iii) Xj+ < a (if 1+ = 0) or Xj+ > b (if £ + = 1), where j = maxW. 

(b) All breakpoints present in 1\j for ( i,j ) £ £ belong to (a, b). 

The output of this procedure is a minimizer x = argmin” f(x) sampled at nodes i £lA : with 
Xi £ [a, b). In the beginning we would call Solve(W, £ , — oo, +oo, 0,0). 
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Our first task is to pick the pivot value A. For edge (i,j) £ £ let A^ be the multiset of 
breakpoint values in the current sequence for X^. If A,; ? is empty for each (i.j) £ £ then we 
return solution Xi = a for all i £ U. If this is not the case then we do the following: 


• For each ( i,j ) £ £ with A tJ ^ 0 compute a median value \j £ Ajj (breaking the ties 
arbitrarily if |Ay| is even). This can be done in 0(1) time since breakpoints in A^ are 
stored in an array in a sorted order. 


• Compute A as a weighted median of the values above where A ij comes with the weight 
| A^ |. This can be done in 0(|£|) time [lT!. (This choice ensures that both of the multisets 
{A' £ A: A' > A} and {A' £ A : A' < A} have at least ||A| elements.) 


The next step is to compute minimizer y = arg min'(y) sampled at nodes i £ U. As 
described earlier, this can be done in 0{\U\\ogm) time. The first message is computed as 
follows: if £_ =0 then fhi-{ A) = +oo, otherwise in*-(A) = — oo (where i = min U). [^] Upon 
reaching node j + for j = maxW we set its optimal label to yj+ = 4 and proceed with the 
backward pass. This procedure partitions U into sets U±,...lA, such that (i) maxW s < minZ/4+i 
for all s, (ii) all nodes i £U S have the same label yi (which we call “the label of U s ” and denote 
as y(U s ) £ {0,1}), and (iii) adjacent sets U s and U s+ 1 have different labels. 


Finally, for each set U s we do the following. Define interval [a s ,& s ] = 


[a, A] if y{U s ) = 0 
JA,6] if y{U s ) = T 

Let £ s = {(*, j) £ £ | {i, j} nW s / 0}. For each edge (i,j) £ £ s modify the sequence for T i: j by 
removing all breakpoints that do not belong to ( a s ,b s ) (since from now on we will need to 
pass messages from i to j only for values A' £ (a,b)). Since breakpoints are stored in a sorted 
order, this takes 0(log?n) per (i,j) £ £ s so 0{\£ | log?n) in total. We then make a recursive call 
Solve(^4, £ s , a s , b Sl y{U s -i), J/(4+i)) where it is assumed that y(U o) = i- and y{U r + 1 ) = (■+■ 
Note that edges (i,j) £ £ connecting adjacent sets U s and U s+ 1 are split into two (one for U s 
and one for U s + 1 ). The same holds for the corresponding mappings T t j. Breakpoints in T l3 that 
are smaller than A are kept in one of the new mappings, and breakpoints that are larger than A 
are kept in the other one. 


Proposition 6. The algorithm above has complexity 0{n log log n). 
A proof is given in Appendix |D| 


Remark 1 Consider the minimization problem on a chain. We say that it has an interac¬ 
tion radius R if the optimal solution at node i £ V depends only on unary terms fj and pairwise 
terms fjk for indices with \j — i\ < R and |k — i\ < R. Note that R > 1. It can be shown that 
the number of breakpoints in all messages stays bounded by some function of R. This means 
that if R is bounded by a constant, the n the complexity of the presented algorithms (except for 
the O(nloglogjr) algorithm in Sec. f.3) is actually linear in n. In particular, complexity 0{n 2 ) 
for the non-convex case in Corollary [I becomes 0{nR), while the complexity 0(n log n) for the 
convex case in Sec. f.2 becomes 0(n\ogR). We do not give a formal proof of these claims, so 
they should be treated as conjectures. 

In practice we often have R n; this happens, in particular, if the regularization term is suf¬ 
ficiently weak relative to the data term. This may explain why in the experiments given in the next 
section many of the algorithms empirically perform better than their worst-case complexities. 


5 This rule for t— = 0 can be justified as follows. Constraint x,- < a means that the problem will not change 
if we add unary term /._ (x) = C\x t - — o' with C > 0 for some a' < a. Such change increases the message 
m ; - (A) by C (since A > a'). Since constant C can be arbitrarily large, the claim follows. The case i— = 1 is 
analogous. 
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5 Application examples 

In this section, we show how the proposed direct algorithms for total variation minimization 
on trees can be used to minimize 2D total variation based model for image processing and 
computer vision. In all examples, we consider the total variation based on the l\ norm of the 
local (2D) image gradients. This allows us to rewrite the models as the sum of one dimensional 
total variation problems, which can be solved by the direct message passing algorithms proposed 
in this paper. The basic idea is to perform an Lagrangian decomposition to transform the 
minimization of a 2D energy to an iterative algorithm minimizing ID energies in each iteration. 

In J2j, a Dykstra-like algorithm |5j has been used to iteratively minimize the TV-t^ model. 
Furthermore, the authors proposed an efficient implementation of the taut-string algorithm, 
which in turn also allows also to tackle the weighted total variation. In case all weights are equal, 
the method is equivalent to Condat’s algorithm. Accelerated Dykstra-like block decomposition 
algorithms based on the FISTA acceleration technique (SI have been recently investigated in 0. 
The authors considered different splittings of the image domain and confirmed in numerical 
experiments that the accelerated algorithms consistently outperform the unaccelerated ones. 

While block-decomposition methods for minimizing the TV -^2 have already been proposed, 
using block-decomposition strategies for solving the TV-£i model or (truncated) TV models 
subject to nonconvex piecewise linear data terms seems to be new. Throughout this section, we 
adopt the framework of structured convex-concave saddle-point problems which can be solved 
by the primal-dual algorithm jTl. 

• In case of total variation with convex quadratic unaries, we show that the proposed method 
scales linearly with the signal length, which is equivalent to the dynamic programming 
approach of Johnson [23], but better than the recently proposed method of Condat DUE] 
whose worst-case complexity is quadratic in the signal length. Furthermore, our algorithm 
can deal with weighted total variation which is not the case for Johnson’s method. Using our 
proposed direct algorithms together with a primal-dual algorithm outperforms competing 
methods on minimizing the TV -^2 model by one order of magnitude. 

• In case of total variation piecewise linear (or quadratic) unaries, we show that the empirical 
complexity also scales linearly with the signal length. We again apply the algorithm within a 
primal-dual algorithm to minimize the TV-£i model for 2D images and obtain an algorithm 
that outperforms the state-of-the-art by one order of magnitude. 

• In case of minimizing the total variation or nonconvex truncated total variation with non¬ 
convex piecewise linear unaries, we show that the empirical worst-case bounds of our al¬ 
gorithms is much better that the theoretical worst-case bounds. We further apply the 
algorithms within a Lagrangian decomposition approach for stereo matching. We point 
out that instead of discretizing the range values of our problems using a discrete set of 
labels, we always rely on continuous valued solutions. Hence, the memory requirement is 
always only in the order of the 2D image size. 

5.1 TV-( 2 image restoration 

First, we provide a comparison of the proposed message passing algorithm for solving total vari¬ 
ation with convex quadratic unaries (TV-f^) on chains to the competing methods of Condat [TU] 
and Johnson 12Jj . The problem is written as 

min Y w ij \x i -x j \ + xYVzi -/i) 2 , (12) 

x£K n ' Z z - 

(i,j)£E i(£V 
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Figure 5: TV -^2 denoising of a noisy ID sine function and a step function of length n containing 
Gaussian noise with a = 0.1. (a) and (b) show in black the input signal for n = 10 3 and in blue the 
TV+ 2 regularized signal using w,_j = n/ 500. (c) and (d) show the CPU times for different signal 
lengths n. Both, the proposed method and Johnson’s method outperform Condat’s method. For 
larger signals, the proposed method appears slightly more efficient than Johnson’s method. In 
case of the step function also Condat’s method seems to be competitive. 


where n is the length of the signal and Wij are the pairwise weights. Fig. [5] provides a comparison 
of the proposed algorithm to the aforementioned competing methods based on regularizing a 
smooth sine-like function and a piecewise constant step function. All three methods have been 
implemented in C++ and executed on a single CPU core. The implementations of Condat and 
Johnson have been provided by the authors. For the smooth sine function, the experiments show 
that while Condat’s method has an almost quadratic worst case time complexity, the method of 
Johnson and the proposed method have a linear time complexity. On the piecewise constant step 
function, Condat’s method appears to perform better but still slower than Johnson’s method and 
the proposed method. 

In the second example, we consider the Rudin-Osher-Fatemi (ROF) [27] model for total vari¬ 
ation image restoration of 2D images. We point out that the ROF problem is a very fundamental 
problem since besides image denoising it can also be used to compute graph cuts [8]. 
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Figure 6: TV -^2 denoising: (a) shows the “TV-Tree” test image of size 400 x 296, which has 
been degraded by adding zero-mean Gaussian noise with standard deviation cr = 25/255. (b) 
shows the result of TV -^2 denoising using Wij = 0.1 for all i,j. (c) shows a comparison in terms 
of iterations and (d) shows a comparison in terms of CPU time. 
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We consider a given image / £ R mra which is defined on a regular 2D graph of in x n vertices 
(pixels). The model is written as the convex minimization problem 

min^ TV h (x) + TV v {x) + ^ J^(xj - /i) 2 , 

iev 

where V is the set of nodes (pixels), with |V| = mn. TV)j(x) and TV v (x) refer to the total 
variation in horizontal and vertical direction, which are given by 

TV k ^ v {x^) — ^ ^ 1Xij\Xi Xj |, 

(i,j)eE h , v 

where E k and E v correspond to the sets of vertical and horizontal edges defined on the 2D graph. 
Now, we perform a Lagrangian decomposition and rewrite the above problem as 

A {x,x',y) = TV h (x) + TV v (x') + ^||x - /|| 2 + (x - x’,y ), 

where y £ R mra is a Lagrange multiplier and (•, •) denotes the usual scalar product. We proceed 
by observing that the convex conjugate TV* of the function TV v is given by 

T K(y)= sup {y,x') ~ TV v {x'). 
x , eR rnn 

Substituting back in the Lagrangian yields the following saddle-point problem: 

min max (x, y) +TV h (x) + Lx - f\\ \ - TV*(y). (13) 

x y Z 

This problem can be solved by the first-order primal-dual algorithm proposed in |7j, which in 
our setting is given by 


= P ro x * k TV* ( y k + Vk(x k + 0 k (x k - x k 1 ))) 


yfc+1 _ 

v k+1 = P rox Tfc(T y h+ 1 || ._j || 2 ) (x fe - T k { y k+1 )) 


(14) 


where Tk,<Jk,0k are positive step size parameters such that T k <Jk = 1, 0 k £ (0,1]. Since the 
saddle-point problem is 1-strongly convex in the primal variable x, we can apply the accelerated 
variant of the primal-dual algorithm, ensuring an optimal 0(l/fc 2 ) convergence of the rimal-dual 


gap (see 0). Observe that since the linear operator in the bilinear term in (131 is the identity, 
the primal-dual algorithm is equivalent to an (accelerated) Douglas-Rachford splitting (see IT]). 

In order to make the primal-dual algorithm implementable, we need to efficiently compute 
the proximal maps with respect to the functions TVh + |||- — /1| 2 and TV*. It can be checked 
that the proximal map for the primal function is given by 


prox T(TVh+ i||._ / || 2) (0 = arg min TVh (x) + 


1 + T 


-(1 + t- 1 )- 1 (/ + t- 1 £)|| 2 , 


for some £ £ R mTl and r > 0. Its solution can be computed by solving m independent problems 
of the form (12). In order to compute the proximal map with respect to the dual function, we 


make use of the celebrated Moreau identity 


V = W ox aTV* (y) + v prox ff _i TK (cr x y) , 


(15) 
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which shows that the proximal map with respect to TV* can be computed by computing the 
proximal map with respect to TV v . 


ProXo-TV* (v) = V - a ' arg min TV v (x) + -\\x - a 1 r]\\i 


for some given rj £ 


of the form (12). According to Sec. 4.1 


and a > 0. The proximal map again reduces to n independent problems 
the total complexity for computing the proximal maps 


is 0{mn) and hence linear in the number of image pixels. Furthermore, the computation of the 
independent ID subproblems can be done fully in parallel. 

Fig. [| presents the results of a performance comparison between the proposed accelerated 
primal-dual algorithm by solving TV -^2 problems on chains (TV-Chains) and the state-of-the art 
primal-dual algorithm proposed in [7j which is based on a pointwise decomposition (TV-Points). 
Both algorithms were implemented in Matlab, while for TV-Chains, the solution of the ID 
subproblems was implemented in C+-K The figure shows that TV-Chains converges significantly 
faster than TV-Points both in terms of iterations and CPU time and hence significantly improves 
the state-of-the art (approx, one order of magnitude). 


5.2 TV-(] image restoration 

Next, we consider again total variation minimization but now with a l\ data fitting term. The 
minimization problem is given by 

min TV h (x) + TV v (x) + \xi - f z \. 

iev 


It is well-known that the TV-£i model performs significantly better compared to the TV-t^ 
model in presence of non-Gaussian noise. However, being a completely nonsmooth optimization 
problem it is also significantly more challenging to minimize. 

In order to apply the direct ID algorithms proposed in this paper to minimize the TV-£i 
model we consider a splitting in the same spirit as in the previous section. 

min max (x,y) +TV h (x ) + \\\x - /||i - TV*{y). 
x y z 


We solve the saddle-point problem again by using the primal-dual algorithm (14). To make the 
algorithm implementable, we need fast algorithms to solve the proximity operators with respect 
to both the primal and dual functions. The proximity operator with respect to the primal 
function is given by 


P rox T ( TVh+J|—/|b) (0 = argmmTI4(x) + \\x 



4111, 


(16) 


for some point £ £ R mn and r > 0. Computing this proximity operator reduces to minimizing 
m independent total variation problems subject to piecewise quadratic unaries. According to 
Sec. 4.2 one subproblem can be computed in 0(n log n) time. The proximity operator with 


respect to TV* is equivalent to the proximity operator in (15) and hence it reduces to n inde¬ 
pendent ID TV problems subject to quadratic unaries. Hence, the overall complexity for one 
iteration of the primal dual algorithm is 0(mn\ogn). 

We first evaluate the empirical complexity of the direct algorithm for minimizing the total 
variation with piecewise quadratic unaries which is used in (161 to compute the proximal map 
with respect to the ID TV-fi problems. For this we again consider a discretized sine function 
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Figure 7: TV-4 denoising of a ID sine function and a step function of length n containing 
zero-mean uniformly distributed noise with magnitude 1/2. (a) and (b) show in black the input 
signal for n = 10 3 and in blue the TV-14 regularized signal using Wij = 200 /n. (c) and (d) show 
the CPU times for different signal lengths n. One can see that the empirical complexity of the 
proposed direct algorithm for computing the proximal map with respect to the ID TV-4 model 
is between O(n) and 0(n\ogn). 


and a step function with different signal lengths n and we added zero-mean uniformly distributed 
noise with magnitude 1 /2 (see Fig@ The pairwise weights were set to Wij = 200 /n. We also set 
the quadratic part of the function close to zero (1CD 6 ) in order to be able to successfully restore 
the signal. From Fig. [TJ one can see that the empirical performance of the proposed algorithm 
for piecewise quadratic unaries is between O(n) and its worst case complexity of 0{n\ogn). We 
also compared with the O(rologlogn) direct algorithm for convex piecewise linear unaries and it 
turned out that the practical performance is about the same. 

Fig. § shows a comparison of the proposed primal-dual algorithm based on chains (TV- 
Chains) to the primal-dual algorithm based on a points-based splitting (TV-Points) [7]. Although 
theoretically not justified, we again used varying step sizes in case of TV-Chains to accelerate 
the convergence. For TV-Points, the acceleration scheme did not work. Both algorithms were 
again implemented in Matlab, while for TV-Chains, the solution of the proximal operators were 
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(c) 


(d) 


Figure 8: TV-i'i denoising: (a) shows the “TV-tree” test image of size 400 x 296, which has been 
degraded by 25% salt&pepper noise, (b) shows the result of TV-^i denoising using Wij = 0.55 
for all i,j. (c) shows the convergence rate in terms of iterations and (d) shows the convergence 
rate in terms of CPU time. 
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implemented in C++. The comparison shows that TV-Chains needs far less iterations compared 
to TV-Points and it is also significantly more efficient in terms of the CPU time (approx, one 
order of magnitude). 

5.3 TV-nonconvex 

Finally, we consider total variation minimization subject to nonconvex piecewise linear unaries. 
Such problems arise for example in stereo and optical flow estimation. The general form of the 
minimization problem we consider here is given by 

min^ V(x) = TV h c (x) + TV v c (x) + £ /<(:*), (17) 

iev 

where f t are continuous piecewise linear functions, which are defined by a set of t + 1 slopes 
( s i)i= o> and a corresponding set of t break-points (AfcOf=i- '^'^hvi x ) re f ers to truncated total 
variation defined by 

TV h,v (*) = X] Wi i ' \ x i ~ x o l)> ( 18 ) 

where Wij are edge weights and C is some positive constant. Observe that convex total variation 
is obtained for C = oo. We again perform a splitting into horizontal and vertical ID problems 
and consider the Lagrangian 

min max ^(x/,) + \£„(x„) + (x h - x v ,y). 
xh,v y 

where 

Vh,v( x ) = TV hA x ) +1 

iev 

The reason for splitting the nonconvex term into two parts is that there is a higher chance that 
part of the nonconvexity are absorbed by the convexity of the regularization terms. Observe that 
while the problem is nonconvex in Xh and x v it is concave in y since it is a pointwise maximum 
over linear functions. 

In contrast to the application of the convex conjugate utilized in the two previous examples, 
we consider here a direct application of the primal-dual algorithm |7j to the Lagrangian function. 
The algorithm takes the following form: 

f x h +1 =P* ox T k * h ( x h ~ T kV k ) 

I x^ +1 = prox Tfc ^ v (x k v + r k y k ) 

\y k+1 = y k + a k {x k + 1 - x k + l ) 
yy k+1 = y k+1 + 9 k (y k+1 - y k ). 

The proximal maps with respect to the nonconvex functions are computed by adding a 
piecewise linear approximation of the quadratic proximity term p^||- — to the functions 

4+ „ and solving the resulting independent ID problems using the direct algorithm for minimizing 
the (truncated) total variation subject to (nonconvex) piecewise linear unaries which has been 
presented in Sec. [3] 

Due to the nonconvexity in the primal objective, the primal-dual algorithm is not guaranteed 
to converge. However, we observe convergence when gradually decreasing the step size parameter 
r k during the iterations. The intuition behind this strategy is that by gradually decreasing the 
primal step size, the primal-dual algorithm approaches a (regularized) dual algorithm, applied 
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to the (concave) dual objective. We found that the rule Tk = T 0 /fc, r 0 ~ 100... 1000 works well in 
practice. The dual step size is set to <Jk = 1 /(t^L 2 ), where the Lipschitz constant L is computed 
as L = v2- The relaxation parameter 6 k is constantly set to 6k = 1. 

We applied problem |l7| to disparity estimation in stereo images. The stereo image pair is 
the “Motorcycle” data set of size 1000 x 1482 pixels, which is taken from the recently introduced 
Middlebury stereo data set [2H] (see Fig. |9|). The stereo data term (piecewise linear functions 
Si in @) and the edge weights (w,; ? in (|18|))are set identically to the stereo experiment de¬ 
scribed in [9(. The piecewise linear matching function is computed using 126 break points, which 
corresponds to a disparity range of [0,125]. 



Figure 9: “Motorcycle” stereo data set used in the experiment, (a) Shows the left input image 
of size 1000 x 1482 and (b) shows the color coded ground truth disparity map. 


In the first experiment, we evaluate the practical performance of our proposed dynamic 
programming algorithms for minimizing the convex and nonconvex total variation subject to 
nonconvex piecewise linear unaries. For this, we consider different sizes of the stereo image pair 
and recorded the average time of computing the solutions of the horizontal lines during the 
first iteration of the algorithm. The number of break points is kept constant in all problems. 
The worst-case complexity for solving one problem of size n is 0(n 2 ) in case of convex total 
variation and exponential in case of nonconvex truncated total variation. The resulting timings 
are presented in Fig. [To] One can clearly see that the practical performance of the algorithm is 
significantly better than the theoretical worst-case complexities (see Sec. [3]). 

In our second experiment, we conduct exactly the same stereo experiment as in [S|. However, 
instead of computing the globally optimal solution by means of a lifting approach in 3D, we 
directly solve the nonconvex 2D Lagrangian problem. We use either convex total variation (TV) 
or truncated total variation (TTV) where we set the truncation value to be C = 10. The primal 
variables x^.v are initialized by the solutions of the ID problems (assuming no coupling between 
the horizontal and vertical chains). 

Fig. 11 shows a comparison between our proposed Lagrangian decomposition and the globally 
optimal solution obtained from [9]. Observe that the primal energy of the Lagrangian decom¬ 
position method quickly decreases during the first iterations. Suprisingly, we can approach the 
lower bound up to a very small error after a larger number of iterations. We also plot the color 
coded disparity maps corresponding to the average solution x k = {x k +x k )/2. While the solution 
after the first iteration still shows some streaking artifacts, the solution obtained after only 10 
iterations is visually almost identical to the globally optimal solution. 

Fig. [12] finally shows a comparison between convex TV and nonconvex truncated TV using 
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n [size] 


Figure 10: Evaluation of the practical performance of the proposed dynamic programming algo¬ 
rithm for minimizing ID total variation subject to nonconvex piecewise linear unaries and using 
different sizes n. The practical performance for both convex total variation (TV) and nonconvex 
truncated total variation (TTV) is significantly better compared to the theoretical worst-case 
complexities presented in Sec. [3] 


a truncation value of C = 10. We also decreased the strength of the data term by a factor of 
two in order to account for the less strong regularization of the TTV. One can see that the TTV 
solution yields sharper discontinuities (for example at the front wheel) and also preserves smaller 
details (fork tubes). It is also a bit more sensitive for outliers in the solution, which however can 
be removed by some post-processing procedure. 

6 Conclusion 

In this paper we proposed dynamic programming algorithms for minimizing the total variation 
subject to different pointwise data terms on trees. We considered general nonconvex piecewise 
linear data terms, convex piecewise linear (and quadratic) data terms and convex quadratic data 
terms. 

In case of quadratic data terms, the resulting dynamic programming algorithm has a linear 
complexity in the signal length and can be seen as a generalization of Johnson’s method [53J to 
weighted total variation. In case of convex piecewise linear data terms, our dynamic program¬ 
ming algorithm has a worst case complexity of 0(n log log n) which improves the currently best 
performing algorithm }14| . In case of convex piecewise quadratic unaries we obtain an algorithm 
with a slightly worse complexity of 0(n log n) but this algorithm turns out to be useful for com¬ 
puting proximity operators with respect to ID TV-^i energies. Finally, in case of nonconvex 
piecewise linear unaries, we obtain a worst-case complexity of 0(n 2 ) which turns out to be useful 
for approximately solving 2D stereo problems. We evaluated the dynamic programming algo¬ 
rithms by utilizing them as basic building blocks in primal-dual block decomposition algorithms 
for minimizing 2D total variation models. Our numerical experiments show the efficiency of the 
proposed algorithms. 

In our block decomposition algorithms all ID subproblems can be solved simultaneously, 
which clearly offers a lot of potential for parallelization and hence a speedup of the algorithms. 
We will pursue this direction in our future work. 
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Iterations 

(a) Convergence (b) Globally optimal [9] 




(c) Lagrangian, k = 1 (d) Lagrangian, k = 10 



(e) Lagrangian, k = 50 (f) Lagrangian, k = 100 


Figure 11: 2D Lagrangian decomposition vs. globally optimal solution obtained from a 3D 
lifting [5]. (a) shows the decrease of the primal energy during the iterations of the primal-dual 
algorithm compared to the lower bound obtained from the globally optimal solution, (b) is the 
color coded disparity image from the global solution, (c)-(f) are the disparity images x k obtained 
from the proposed Lagrangian decomposition after k = 1, k = 10, k = 50, and k = 100 iterations. 
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(a) TV 


(c) TV-Detail 


(b) TTV, C m 10 



(d) GT-Detail 


(e) TTV-Detail 


Figure 12: Qualitative comparions between convex total variation (TV) vs. nonconvex truncated 
total variation (TTV). TTV beds to sharper discontinuities in the solution and better preserves 
small details (see the detail views). 


A Proof of Proposition [2] 

It suffices to prove the claim for m = 2; the general claim will then follow by induction. Denote 
s = h (g> g; we need to show that h 2 = s(y) for all i/gR, where h 1 = h<S> g 1 and h 2 = h 1 <8> g 2 - 

Proof of h 2 (y) < s(y) First, observe that h 2 (z ) < h 1 (z) < h(z) for any z since g 1 (0) = g 2 { 0) = 
0. For any x we have 

h 2 (y) < h 1 (y) < h(x) + g 1 (y - x) 

h 2 {y) < h l {x)+g 2 (y-x) = h(x) + g 2 (y - x) 

and so 

h 2 {y) < h(x) + min g k (y - x) = h(x) + g(y - x) 

fce{ 1 , 2 } 

Therefore, h 2 (y) < rma x \h{x) + g(y — x)\ = s(y). 
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Proof of h 2 (y) > s(y) For any x,x' we have 

s(y) < h{x)+g(y-x) < h(x) + [g{x' - x) + g{y - x')] 

< h(x) +g 1 {x' - x) + g 2 (y - x') 

where the second inequality holds by the triangle inequality ([9]). Therefore, 

s(y) < mm[[h(x) + g 1 (x'- x)]+g 2 (y - x')\ 

x,x' 

= minf/i 1 ^') +g 2 (y - x')] = h 2 (y) 

x' 

Mapping n It remains to prove that mapping tt = tt 1 o it 2 corresponds to min-convolution 
h 2 = h® g. Let x’ = t r 2 (y) and x = tt 1 ^’), then h 2 (y) = h 1 ( x') + g 2 (y — x') 

h 2 (y) = h2{x')+g 2 (y-x') 

= [H%) + g 1 {x' - z)] +g 2 (y~- x') 

> h{x) + g{x' -x) + g{y - x') > h(x) + g(y - x) 

(the last inequality is again by ([9])). The inequality h(x) + g(y — x) < h 2 (y) means that x £ 
argmin X [h(x) + g(y - a;)]. 

B Proof of Proposition [3] 

Suppose that we are given positive numbers b±,... ,bw- We will construct a chain instance with 
n = 2N nodes such that values will give the input numbers in a sorted order. This will imply 
the claim since sorting requires at least fl(n\ogn) comparisons in the worst case [IT .. 

The unary terms for nodes i = 1,... ,N nodes are given by gi{z) = < ' — '. The unary 

I 1 if z > bi 

terms for nodes i = N + 1,... ,n are zeros. The weights for edges (i,j) £ E are set as follows: 

uj-j = 0 for all edges, tot = TV + 1 if i < N, and = 2TV — i — \ if i > N. 

Let us sort numbers bi ,..., 6jv in the non-decreasing order, and denote the resulting sequence 
as (ci,..., cat). It can be checked that A+- = C 2 N-i for * = IV,, 2 N — 1. 

C Proof of Proposition [4] 

First note that each problem is reduced to solving a finite amount of subproblems of the same 
type (recall that both Vo and V\ are unions of chains). Let P® stand for the original problem 
and inductively for i > 1, let P\ , ..., P£. be all the (direct) subproblems of the problems P-f -1 , 
..., Pj.~ 1 1 . Also, let A(P), n(P) be the number of breakpoints and the number of nodes in 
subproblem P, respectively. It follows by induction on i that: 

• 12k M-Pfc) < |A| for each i > 0. 

• 12k n (Pk) — n f° r eac ^ ' L — d. 

• A (PI) < |A|/2* for each i > 0 and k > 1 and hence i is O(logn). 


29 


Since the complexity of dividing problem P into subproblems is 0(\(P) +n(P)) as described 
in Section [4] we now compute the total complexity as 


Y,J2°^ p k)+ n ( p k)) = E A «)+ n «) 

i k i \ k / 

= ^O(n) C O(nlogn). 

i 

D Proof of Proposition [6] 

We use the same strategy to compute the complexity of the recursive algorithm as in Section 
[c| Let Pi be the original problem and inductively for i > 1 , let P, P' k be all the (direct) 
subproblems of the problems P( _1 , ..., P^” 1 . Let A(P), U(P), and £{P) be the sizes of the sets 
A, U , and £ of problem P, respectively. 

We claim that: 


• J2k^( P k) i s 0{\U\) for each i > 0. 

• J2k^( P k) i s 0(|f|) for each i > 0. 

• A ( p l) < |A| • (3/4)* for each i > 0 and k > 1 and hence i is O(logn). 

Since the total number of added vertices (or cut edges) is at most \U\ (or \£|), the first 
two statements follow immediately from induction on i. For the third statement recall that 
the value A was chosen so that no more than 3/4 breakpoints can end up in any of the sets 
A + = {A' € A : A' > A}, A - = {A' £ A : A' < A}. Since for each subproblem the set of 
breakpoints is a subset of one of A + , A - the third statement follows again from induction on i. 

Since the complexity of dividing problem P into subproblems is 0((U(P) +£(P)) logm), we 
compute the entire complexity as 

^^0((W(P*)+f(P*))logm) 

i k 

= ^o[^2u{Pl)+£(P^ log 
= ^ 0((\U\ + \£\) logm) C 0{n\ogn\ogn). 
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